(57) Abstract

A code division multiple access (CDMA) communication system reduces system self-interference and enhances system capacity by
making rate selection decisions for individual speech encoders in concert with other speech encoders. The system utilizes perceptually
weighted error metrics (401) as input into a rate controller (404) which determines and provides selected rates (402) back to the encoders
(105). The system provides optimum voice quality and system capacity in that it allows specific encoders to decrease their rate, which
improves capacity, as necessary while allowing other encoders to maintain their rates. This prevents needless degradation in voice quality
at those times when system capacity needs to be temporarily increased.
Method And Apparatus For
Group Encoding Signals 

Field of the Invention

The present invention relates to communication systems
utilizing code division multiple access (CDMA) techniques and
more specifically to variable rate speech encoding methods for
reduction of system self-interference and enhancement of system
capacity in point-multipoint multiple-access links with colocated
digital speech encoding in such communication systems.

Background of the Invention

In recent years a variety of techniques have been used to
provide multi-user mobile communications within a limited
available radio-frequency spectrum. These methods have
included frequency division multiple access (FDMA), time
division multiple access (TDMA), and code division multiple
access (CDMA) or, more usually, hybrids of these methods. All of
these methods have been employed within the past decade in the
design of commercial cellular telecommunications systems:
witness the use of FDMA in the North American AMPS system,
FD/TDMA in the European Groupe Spécial Mobile (GSM)
standard, and - more recently - the adoption of a direct sequence
FD/CDMA approach by the United States Telecommunications
Industry Association as embodied in its IS-95 standard. In the
IS-95 standard, subscribers share one of several wideband radio
channels in the cellular band. Several proposals for so-called
personal communications systems (PCS) are also based on
similar FD/CDMA principles.
Almost all recent cellular and PCS systems have used
digital speech coding and forward channel error correction as the
physical layer for voice communication. More interesting in this
context is the use of voice activity detection (VAD) to recognize
the presence or absence of speech on the part of either calling
party. In the absence of speech, the speech encoder may instruct
the modulator or transmitter to which it is linked to reduce its
output power to zero, or to transmit occasional packets of
information describing only the background noise at either
user's location. Reducing the radio transmitter's duty cycle in this
fashion provides the twin benefits of a reduction in power
consumption (which increases battery life in the case of the mobile
unit) and a reduction in interference between users sharing the
same RF spectrum. Depending on the circumstances of the
conversation, a reduction in transmitted power of between 40%
and 65% can be achieved. The amount of power reduction is
ultimately limited by the extent to which the degraded voice
quality which accompanies significant VAD techniques is
considered acceptable.

The possibility of power reduction is particularly important
for CDMA systems. In such systems, user capacity is inversely
proportional to the amount of system self-interference. In the TIA
IS-95 FD/CDMA standard, the approach is slightly broadened by
the use of a variable rate speech encoder in place of simple on-off
or discontinuous transmission methods. In the IS-95 standard,
the encoded speech is separated into 20 ms intervals which the
speech encoder may elect to encode at an effective bit rate of 8000
bps, 4000 bps, 2000 bps, or 800 bps. Both the base-station to mobile
station (forward) and mobile station to base-station (reverse) IS-95
links exploit variable rate encoding. In the case of the forward
link, mean transmit power is reduced by scaling down the output
power as the encoded rate decreases. Channel symbol repetition
allows symbol combining at the mobile receiver and hence
maintenance of the energy per symbol to noise power spectral
density ratio which determines link performance. It should be
noted that mean transmit power - and hence system
self-interference - is reduced by a factor of four during 800 bps
transmission. By averaging over the aggregate voice activity for
typical two-way conversations, it has been estimated that when
using the standard speech encoding and voice activity detection
algorithm defined in TIA standard IS-96 the mean transmit power
will drop to around 41% of its nominal value. This has a
significant effect on system forward link capacity.

In current implementations of the IS-95 air interface
standard and its associated IS-96 speech encoder standard, however,
each forward voice link is encoded in isolation. That is, speech
encoders make individual determinations of the minimum
encoded rate required to maintain acceptable voice quality without
regard to the other voice channels sharing the same RF spectrum.
This requires that the rate-determination algorithm in each
speech encoder should always minimize its encoded rate, even
when the encoded rates of the other speech encoders sharing the
same spectrum do not require this. For example, if all speech
encoders sharing the same channel at a base-station should
simultaneously seek to transmit at a low rate, the reduction in the
total output power at the base-station means that each speech
encoder could relax to the next higher rate at no risk to system
capacity. Since minimizing the mean transmitted rate for a
variable rate speech encoder requires that the voice quality be
compromised, isolated speech encoding gives up voice quality
needlessly. Also, as the CDMA system approaches capacity, and
constraints are placed on the transmitted rate of each speech
encoder in order to temporarily boost capacity, such constraints
must be applied blindly, with all voice links subject to the same
constraint irrespective of the effect on voice quality. This is
wasteful, since it is known that voice quality depends on many
different factors and so opportunities will exist for reducing the
rate of specific encoders with the least overall effect on the voice
quality experienced within the sector/cell.

Thus a need exists for a method and apparatus for global
speech encoding on the forward link of an FD/CDMA system by
making rate selection decisions for individual speech encoders in
concert with all other speech encoders feeding the same sector/cell
and RF channel.

Brief Description of the Drawings 

FIG. 1 generally depicts, in block diagram form, a prior art
CDMA base-station transmitter.

FIG. 2 generally depicts, in block diagram form, the prior art
rate determination apparatus specified in speech encoding
standard TIA IS-96.

FIG. 3 generally depicts, in block diagram form, a codebook
excited linear predictive speech encoder of the type utilized in the
preferred embodiment and described in detail in speech encoding
standard TIA IS-96.

FIG. 4 generally depicts, in block diagram form, the use of a
supervising processor or rate controller to group speech encode in
accordance with the invention.

FIG. 5 generally depicts a rate/quality table utilized by the
rate controller as part of a number of algorithms for optimizing
the overall speech quality of a sector/cell subject to a constraint on
transmitted power from the sector/cell in accordance with the
invention.

FIG. 6 generally depicts, in block diagram form, an
apparatus for implementing the CDMA group encoding method
in accordance with the invention.

FIG. 7 generally depicts, in block diagram form, a rate
controller which may beneficially implement group encoding in
accordance with the invention.


Detailed Description of a Preferred Embodiment 

A code division multiple access (CDMA) communication
system reduces system self-interference and enhances system
capacity by making rate selection decisions for individual speech
encoders in concert with other speech encoders. The system
utilizes perceptually weighted error metrics (401) as input into a
rate controller (404) which determines and provides selected rates
(402) back to the encoders (105). The system provides optimum
voice quality and system capacity in that it allows specific encoders
to decrease their rate, which improves capacity, as necessary while
allowing other encoders to maintain their rates. This prevents
needless degradation in voice quality at those times when system
capacity needs to be temporarily increased.

The preferred embodiment of the invention is described as
it relates to a CDMA digital cellular communications system based
on the Telecommunications Industry Association standards IS-95
and IS-96. It will be appreciated by one skilled in the art that the
invention may be applied to any CDMA point-to-multipoint link
(generally the forward link of a digital cellular system) in which
self-interference reduction by variable rate speech encoding is to be
applied. However, the technique discussed may be beneficially
utilized in any communication system, and in fact is not restricted
to communication systems. For example, the technique may be
utilized where speech encoding occurs for storage in a memory
means having limited memory space. In essence, the technique is
applicable to any application where encoding (be it speech, video,
data, etc.) is utilized and constraints related to the encoding (be it
power level, encoding quality, system capacity, memory space, etc.)
are present.

The method and apparatus group encodes signals by
accepting rate determination information from at least two
encoders and determining the rate of at least one encoder based on
the rate determination information for the at least two encoders.
In the preferred embodiment, rate determination information is
quality information relating reconstruction quality to encoding
rate (on a 20 ms segment-by-20 ms segment basis). The
quality information includes, but is not limited to, perceptual
weighting error metrics generated by the analysis-by-synthesis
speech encoders, signal-to-noise (S/N) ratio, segmented S/N,
cepstral distance, an LPC distance measurement and a BARK
spectral distance measurement, all of which are well known in the
art.

The determination of the rate of the at least one encoder is
based on a threshold criterion (which is typically predetermined).
In the preferred embodiment, the threshold criterion may include,
but is not limited to, the total output power of the sector/cell to
which the encoders are assigned, the total output power of an
adjacent sector/cell, the current power level of transmission by a
serving base-station, the current data rate of the at least two
encoders, the memory available in a memory means, the
processing power available in a processing means and the
bandwidth available in a predetermined spectrum. Also in the
preferred embodiment, the encoders are variable rate
analysis-by-synthesis encoders. These encoders may encode signals
including, but not limited to, speech signals, video signals and
data signals.

FIG. 1 shows the high-level architecture of the forward link
of a CDMA base-station (102) designed for the preferred
embodiment of the TIA IS-95 digital cellular radio standard. The
base-station (102) of FIG. 1 performs, inter alia, variable rate
speech encoding, forward error correction, forward link power
control, multiple access spreading, and modulation and
transmission. In FIG. 1, several standard μ-law encoded,
multiplexed, 64 kbps pulse code modulated (PCM) T1 links (101)
from the public switched telephone network (PSTN) (100) are
brought to a demultiplexer (103). Each 64 kbps voice link (104) is
then passed through a digital speech encoder (105). In a
conventional implementation, the speech encoding function is
performed by a number of general purpose digital signal
processors (DSPs) such as the Motorola DSP56156 processor, ROM
coded DSPs, or application specific integrated circuits (ASICs).
Several such processors are generally grouped onto a single
printed circuit board (although this is not necessary for the
invention) which is then capable of processing a full T1 trunk of
multiplexed voice channels. After speech encoding, error
correction (106) is applied in the form of convolutional and cyclic
codes, followed by BPSK baseband modulation (107), Walsh cover
and short pseudo-noise (PN) sequence spreading (108), low-pass
filtering (109), transmit power level adjustment (110) and power
amplification (111), and finally transmission to the mobile station
(113) (for simplicity, frequency shifting to RF is not shown).

A block diagram of the TIA IS-96 standard processing
performed by the DSP or other device used to implement the
speech encoder (105) is shown in FIG. 2. As shown, speech encoder
(105) can be broken down into two main elements: rate
determination and encoding. Consider first the rate
determination function. In the IS-96 standard, each speech
encoder (105) divides its associated PCM signal stream into
contiguous 20 ms frames consisting of 160 samples of the source
speech waveform. The power level of each frame (which is the
zeroth lag R(0) of the autocorrelation function estimate of the
frame produced by the autocorrelation estimator (200)) is fed to a
bank of comparators (203) which establish which of three
monotonically increasing threshold levels the frame power exceeds.
These levels are generated by second-order interpolation of a
non-linear average of the power level of the speech signal formed by
block (201). Note that all these processing steps are completely
defined in TIA standard IS-96. If the current frame energy is less
than the lowest of the three thresholds, the frame is declared a
1/8 rate frame; if the frame energy lies between the lowest and
middle thresholds, the frame is declared a 1/4 rate frame; if
it lies between the middle and highest thresholds, the frame is
declared a 1/2 rate frame; and finally, if the frame energy exceeds
the highest threshold level, the frame is declared a full rate frame.
This final step is performed by comparators (203) and decoder (204)
to produce the selected rate (205).
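
The threshold comparison just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `frame_power` and `select_rate` are hypothetical, R(0) is taken as the unnormalized sum of squared samples, a frame whose power exactly equals a threshold is treated as not exceeding it, and the three threshold values are supplied by the caller rather than derived by the IS-96 interpolation procedure, which is not reproduced here.

```python
from typing import Sequence

# Candidate rates ordered from lowest to highest.
RATES = ("1/8", "1/4", "1/2", "full")


def frame_power(samples: Sequence[float]) -> float:
    """Zeroth autocorrelation lag R(0), assumed here to be the
    unnormalized sum of squared samples over the frame."""
    return sum(s * s for s in samples)


def select_rate(power: float, thresholds: Sequence[float]) -> str:
    """Map a frame power to a rate using three increasing thresholds.

    The number of thresholds the power exceeds (0 to 3) indexes RATES.
    Equality with a threshold counts as "not exceeded" (an assumption).
    """
    if len(thresholds) != 3 or list(thresholds) != sorted(thresholds):
        raise ValueError("expected three increasing thresholds")
    exceeded = sum(1 for t in thresholds if power > t)
    return RATES[exceeded]
```

Counting exceeded thresholds plays the role of the comparator bank (203) followed by the decoder (204): the comparators produce three binary outputs and the decoder turns them into one of four rates.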

The selected rate (205) is then input to the codebook excited
linear predictive (CELP) speech encoding function (206) which
forms a parametric description of the speech frame using the
specified number of bits for that rate. In the preferred embodiment,
the number of bits used to express the encoded parameters of an
1/8 rate frame is 16 (ignoring additional bits used for error
correction/detection); for a 1/4 rate frame, 40 bits; for a 1/2 rate
frame, 72 bits; and for a full rate frame, 160 bits. While CELP is
depicted and discussed in the preferred embodiment, other
encoding techniques such as, inter alia, waveform coding, linear
predictive coding (LPC), sub-band coding (SBC), code excited linear
prediction (CELP), stochastically excited linear prediction (SELP),
vector sum excited linear prediction (VSELP), improved
multi-band excitation (IMBE), and adaptive differential pulse code
modulation (ADPCM) coding algorithms may likewise be
beneficially employed.

For the purpose of clarity, it is necessary to describe in more
detail the CELP speech encoding procedure. A high-level block
diagram of the signal processing used in the CELP speech encoder
of the preferred embodiment appears in FIG. 3. As shown in FIG.
3, an estimate (309) of the autocorrelation function of consecutive
20 ms frames of the 64 kbps speech signal (104) is first obtained
(this is usually done in common with block (200) of the rate
determination procedure). Next, solution of the so-called Normal
Equations using, for example, the Schur Recursion (301), provides
the short term linear predictive (STP) filter coefficients (310).
Often, the STP filter (303) is a lattice filter, and the STP coefficients
are lattice filter reflection coefficients. After quantization (302) by
line spectral pairing or some other robust quantization method,
the STP coefficients are used to filter the speech signal. The
resulting signal is next passed to the long term prediction (LTP)
filter (313) and (in the case of a CELP linear predictive coder) the
codebook search procedure. The LTP filter is generally a first order
recursive filter whose feedback delay and gain are variable - they
appear in FIG. 3 as LTP lag L (304) and LTP gain G (305). Encoding
then proceeds by simultaneously adjusting the LTP lag and gain
and the codebook index I (312) so that the squared error at the
output of the LTP filter is minimized. L, G, and I are then
quantized (often using simple biased linear quantizer methods),
and passed along with the STP coefficients to the error correction
block. The performance of this analysis-by-synthesis procedure can
be improved by weighting the error metric which is to be
minimized by the human auditory frequency response. This is
done with a perceptual weighting filter (307) which modifies the
error metric (308) to emphasize those frequency components to
which the human ear is most sensitive. One skilled in the art will
appreciate that the perceptually weighted error metric is made
available by almost all sophisticated analysis-by-synthesis speech
encoders. The present invention, as mentioned above, is therefore
not limited exclusively to CELP speech encoders.

With this background, group speech encoding in accordance
with the invention may now be described. It is clear from FIG. 1
and FIG. 2 that, in the prior art, the encoded rate of each forward
link speech encoder is determined in isolation. That is, the
encoded rate of each 64 kbps voice link is determined exclusively
by signal processing of that speech signal. Since the amount of
self-interference (and hence the capacity) in the forward link of a
CDMA system depends on the mean encoded rate of each encoder,
it is also clear that in order to operate at the maximum possible
capacity, the rate determination algorithm of each speech encoder
must be designed to always seek the minimum possible rate, since
each encoder operates in isolation and has no knowledge of the
total power (and hence system self-interference) being emitted at
the base-station antenna (112). Since speech quality must be
sacrificed to achieve low mean encoded rates, this implies that
overall system speech quality is unnecessarily sacrificed when the
system is not at its maximum capacity - or equivalently, is not
transmitting its maximum allotted power. Put another way,
isolated speech encoding allows the total instantaneous output
power at the base-station to have a large variance.

Since, in many CDMA power control algorithms, a strict
limit is placed on total emitted power from a cell or sector, the rate
used to encode individual links must be kept unnecessarily low.
In addition, it is known that the perceptual quality of a digitally
encoded voice link is dependent not only on the speech encoder
being used, but also on factors such as the gender, accent, loudness
of the speaker, and environmental factors such as type/levels of
acoustic background noise. Thus, by encoding each link in
isolation, no recognition is made of situations where one link may
be reduced in rate with a smaller loss in perceived overall voice
quality than an equivalent reduction in rate on another link, and
hence another speaker. Further, the current art embodied in TIA
standard IS-96 makes no use of the perceptually-weighted
encoding error in performing rate determination.

The method shown in FIG. 4 can be used to overcome these
deficiencies. In FIG. 4, each speech encoder (105) evaluates, for
each 20 ms frame, the perceptually weighted error metric (401)
produced by encoding the speech frame at each of the four
candidate rates (more than four rates may be possible in alternate
embodiments). This information is then passed back to a
supervising rate controller (404). Rate controller (404) then forms
a rate/quality table similar to that of FIG. 5, which depicts the
perceptually-weighted error produced by encoding at each of the
candidate rates for each of the N speech encoders reporting to the
rate controller.

A simple approach to optimizing the overall voice quality
of the cell or sector starts by assuming that all N voice channels
have equal transmit power. All of the encoders (105) are placed in
the lowest candidate rate and the total transmit power P is
calculated by rate controller (404). In this case, P is simply equal to
the sum of the rate values for all N encoders, where the rate value
for 1/8 rate is 1/8, for 1/4 rate is 1/4, and so on. Rate controller
(404) then finds the largest entry in the rate/quality table
corresponding to the current candidate rate for any of the N
encoders. This is equivalent to identifying the encoder with the
worst voice quality (i.e. the largest perceptually weighted error) for
the current set of selected rates. The rate for that encoder is
increased to the next highest rate, and P is recalculated. This
process continues until P exceeds some total power threshold T at
which time the procedure terminates. An improved approach
would be to apply the procedure to rate/quality table entries which
have been weighted by the transmit gain associated with each
encoder. This would be extracted from power level block (110). It
will be appreciated by one of ordinary skill in the art that the
overall effect of this procedure is to reduce power by sacrificing the
rate of those encoders which will suffer the least reduction in
quality by operating at a lower rate.
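
The simple approach above can be illustrated with a short Python sketch under the equal transmit power assumption. The function name `group_select_rates` and the table layout (`table[n][r]` holding the perceptually weighted error of encoder n at rate index r) are hypothetical. Two details are assumptions not fixed by the text: an encoder already at full rate is skipped when searching for the worst entry, and the loop stops before committing the increase that would push P above T, whereas a literal reading of the text would stop one step later.

```python
from fractions import Fraction
from typing import List, Sequence

# Rate values for 1/8, 1/4, 1/2 and full rate, lowest first.
RATE_VALUES = (Fraction(1, 8), Fraction(1, 4), Fraction(1, 2), Fraction(1))


def group_select_rates(table: Sequence[Sequence[float]],
                       threshold: Fraction) -> List[int]:
    """Greedy rate selection over a rate/quality table.

    table[n][r] is the perceptually weighted error of encoder n when
    encoding the current frame at rate index r. Returns one rate index
    per encoder.
    """
    n = len(table)
    rates = [0] * n                              # all encoders at lowest rate
    power = sum(RATE_VALUES[r] for r in rates)   # P under equal power
    top = len(RATE_VALUES) - 1
    while True:
        # Assumption: encoders already at full rate cannot be raised.
        candidates = [i for i in range(n) if rates[i] < top]
        if not candidates:
            break
        # Encoder with the largest error at its current rate.
        worst = max(candidates, key=lambda i: table[i][rates[i]])
        new_power = (power - RATE_VALUES[rates[worst]]
                     + RATE_VALUES[rates[worst] + 1])
        if new_power > threshold:
            break  # Assumption: stop before P would exceed T.
        rates[worst] += 1
        power = new_power
    return rates
```

Exact fractions are used for the rate values so that the comparison of P against T is not affected by rounding. The transmit-gain weighting mentioned in the text could be introduced by scaling each table row before the search, but that variant is not shown here.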

A more complex approach would be as follows. Assume
that, as above, the goal (i.e., the predetermined criterion) of the
rate-reduction scheme during periods of high traffic loading is to
maintain the overall transmitted power to be less than some
threshold T, where T is set according to the current load
conditions. Define a global measure Q of speech quality for the
sector/cell served by the base-station to be the sum of the
perceptual errors for the current set of selected rates for the N
voice channels. Each encoder is initialized to encode at the
maximum rate. Q is then evaluated and the corresponding
transmitted power calculated using either the equal power
assumption or the weighted transmit power method described
above.
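
As a rough illustration of the two quantities defined in this paragraph, the following Python sketch evaluates Q and the transmitted power for a given set of selected rates. The function names and the table layout are hypothetical, and the weighted power is interpreted here as the sum of each encoder's transmit gain multiplied by its rate value, which is an assumption about the weighted transmit power method. Only the evaluation step is shown; no search over rates is sketched.

```python
from typing import Optional, Sequence

# Rate values for 1/8, 1/4, 1/2 and full rate, lowest first.
RATE_VALUES = (0.125, 0.25, 0.5, 1.0)


def global_quality(table: Sequence[Sequence[float]],
                   rates: Sequence[int]) -> float:
    """Q: sum of the perceptual errors at the currently selected rates.

    table[n][r] is the perceptually weighted error of encoder n at
    rate index r; rates[n] is the rate index selected for encoder n.
    """
    return sum(table[n][r] for n, r in enumerate(rates))


def transmit_power(rates: Sequence[int],
                   gains: Optional[Sequence[float]] = None) -> float:
    """Transmitted power for the selected rates.

    With gains omitted, every channel has unit gain (equal power
    assumption). Otherwise each rate value is scaled by the channel's
    transmit gain (an assumed form of the weighted method).
    """
    if gains is None:
        gains = [1.0] * len(rates)
    return sum(g * RATE_VALUES[r] for g, r in zip(gains, rates))
```

With every encoder initialized to the maximum rate, `rates` would be a list of N entries all equal to 3, and the resulting power can be compared against the threshold T.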

A simplification of this method would occur where the rate
controller (404) was not available, but where each DSP was
encoding several voice links by time-sharing its available
computational resources. In that case, the rate selection procedure
would be applied over the number of voice channels for which the
DSP was performing encoding. FIG. 6 generally depicts an
apparatus which may be used to implement this scenario. In FIG.
6, a single DSP (603), such as the Motorola DSP56156,
communicates via a time-division multiplexed serial bus or a
conventional parallel address/data bus. Rate determination
information and rate selections are passed between the controlling
DSP (603) and the DSPs (602) used for speech encoding via bus
(604). Alternatively, the controlling DSP (603) may be eliminated
and one of the encoder DSPs (602) promoted to fulfill the global
rate controller function and speech encoding for one or more
voice channels.

FIG. 7 generally depicts, in block diagram form, a rate
controller (404) which may beneficially implement group encoding
in accordance with the invention. Rate controller (404) comprises
means for accepting (700) rate determination information (401)
from a plurality of encoders (105). In the preferred embodiment,
rate determination information is quality information which
includes a perceptually weighted error metric. Means for accepting
(700) has its output entering means for determining (703) which
determines encoding requirements based on a predetermined
criterion. The predetermined criteria include those stated above
as threshold criteria. The output of means for determining (703)
is input into means for adjusting (706) which adjusts the encoding
rate for any encoder out of the plurality of encoders based on the
rate determination information and the predetermined criterion.
In a scenario where the predetermined criterion is total transmit
power or available memory space, means for adjusting (706) will
typically increase the encoding rate for the encoder having the
worst quality (based on a determination/calculation of either total
transmit power or available memory space and a threshold) as
described above. However, certain predetermined criteria, such
as system capacity, may require means for adjusting (706) to
decrease the encoding rate for a particular encoder.

While the invention has been particularly shown and
described with reference to a particular embodiment, it will be
understood by those skilled in the art that various changes in form
and details may be made therein without departing from the spirit
and scope of the invention.

What I claim is: 
Claims

1. A method of group encoding signals comprising the steps
of:

accepting rate determination information from a plurality
of encoders;

determining encoding requirements based on
predetermined criterion; and

adjusting the encoding rate for any encoder out of the
plurality of encoders based on the rate determination information
and the predetermined criterion.

2. The method of claim 1 wherein the predetermined
criterion further comprises either the total output power of the
sector/cell to which the encoders are assigned, the total output
power of an adjacent sector/cell, the current power level of
transmission by a serving base-station, the current data rate of the
at least two encoders, the memory available in a memory means,
the processing power available in a processing means or the
bandwidth available in a predetermined spectrum.

3. An apparatus for group encoding signals, the apparatus
comprising:

means for accepting rate determination information from at
least two encoders; and

means, coupled to the means for accepting, for determining the
rate of at least one encoder based on the rate determination
information for the at least two encoders.

4. The apparatus of claim 3 wherein the at least two encoders 
further comprise analysis-by-synthesis encoders. 
5. The apparatus of claim 4 wherein the rate determination
information further comprises quality information generated by
the at least two encoders.

6. The apparatus of claim 5 wherein the quality information
further comprises perceptual weighting error metrics generated by
the analysis-by-synthesis speech encoders, signal-to-noise (S/N)
ratio, segmented S/N, cepstral distance, an LPC distance
measurement and a BARK spectral distance measurement.

7. The apparatus of claim 3 wherein the means for
determining the rate of the at least one encoder further comprises
means for determining the rate of the at least one encoder based
on a threshold criterion.

8. The apparatus of claim 7 wherein the threshold criterion
further comprises either the total output power of the sector/cell
to which the encoders are assigned, the total output power of an
adjacent sector/cell, the current power level of transmission by a
serving base-station, the current data rate of the at least two
encoders, the memory available in a memory means, the
processing power available in a processing means or the
bandwidth available in a predetermined spectrum.

9. The apparatus of claim 3 wherein the at least two encoders 
further comprise at least two variable rate encoders. 

10. The apparatus of claim 3 wherein the signals further
comprise either speech signals, video signals or data signals.